ABSTRACT



Reported is a formative evaluation of the Biological
Sciences Curriculum Study "Biological Science: Patterns and
Processes", designed for academically unsuccessful students. 
"Criterion referenced" tests were developed, with items selected to 
indicate the extent of students’ learning rather than to discriminate 
between students. An alternate form, pretest-posttest research design 
was used. Randomly selected students within classes of teachers who 
had participated in feedback and training activities were given 
alternate test forms for each of five content areas. Scores on these 
tests served as the dependent variables with scores on Verbal 
Reasoning and Numerical Ability subtests of the Differential Aptitude 
Test, and Davis Reading Test scores serving as independent variables. 
Data were also collected on school and community characteristics. 
Analysis of covariance and multiple regression analysis showed 
significant differences between classes (tentatively attributed to 
teacher performance), and significant correlations between reading 
comprehension and achievement. Recommendations are made for revision 
of the materials and for similar evaluative studies. Appended are 
tables of results and statistical analyses, and copies of tests used. 
(EB) 
Chapter 1



Background of the Study 



There is a large group of students in American schools that, for a 
variety of reasons, may be categorized as academically unsuccessful. 

Until recently, no concerted effort has been made to delineate the 
characteristics of this group and to prepare curricular materials that
provide academic successes while maintaining integrity of the content 
and developing its relevance for the student. In 1966, the Biological 
Sciences Curriculum Study commercially released a program in biological 
sciences for the academically unsuccessful entitled Biological Science:
Patterns and Processes.

Based on feedback comments from teachers and students, the materials 
have been remarkably successful. However, no quantitative data exist to 
justify the claims of success for these materials and for the unique 
instructional procedures they entail. If this program is to serve as a 
model for curriculum development in other disciplines, critical and 
objective evaluations of the attainments of students taught with materials 
of this type are needed. 

The BSCS originally developed three parallel sets of course materials 
for high school biology: Biological Science: Molecules to Man (Blue
Version), High School Biology: BSCS Green Version, and Biological Science:
An Inquiry Into Life (Yellow Version). These materials were prepared by
teams of writers working at Summer Writing Conferences during three 
successive years — 1960, 1961, and 1962. In the years following each of 
the first two summers' work, the materials were widely tested and reviewed 
to give feedback for rewriting. 

The BSCS became interested in pupils exhibiting poor achievement 
during the years the three BSCS Versions were being evaluated (1960-63). 

A Special Materials Committee was organized in 1962 to determine the 
characteristics of these students and to make recommendations as to how
they might best be taught. The Committee examined and analyzed the 
literature on deprived youngsters, school dropouts, and students with 
learning problems. They interviewed teachers of the academically 
unsuccessful student and observed them with their classes. Data collected 
during the evaluation of the Versions were examined for criteria that 
could be used to predict student success. 

After an exhaustive study of all these data and materials, the 
Committee prepared a plan for the development of materials in biology
that could be expected to be more suitable for these academically 
unsuccessful students. The plan included: 

(a) writing the materials at a reading level in keeping with 
the students' abilities, while keeping formal reading
assignments at a minimum, to produce essentially an
"unbook."
(b) providing interesting activities within a framework of
multisensory perceptions in the classroom situation. 

(c) constructing materials so that they would lead, in small 
steps, from one fact to another, eventually to a 
generalization and then to a new concept. 

(d) centering the learning situations around laboratory 
activities as much as possible in order to capitalize upon 
the potential interests and abilities of these students. 

(e) structuring the development of selected concepts common to 
all three of the BSCS Versions, but in a manner especially 
suitable to the characteristics of these students. 

(f) developing activities and procedures that served to 
demonstrate the role of inquiry in the accumulations of 
knowledge upon which the theories of modern biology are 
based. 

With these guidelines, BSCS writing teams, composed of high school 
teachers, college biologists, educational psychologists and science 
educators, proceeded to develop experimental materials that were used and 
evaluated in classroom situations by a total of 300 teachers and 15,000 
students. Several revisions were made prior to commercial publication. 
Teachers involved in the 1964-65 evaluation of these materials provided 
feedback used in the summer of 1965 to guide the final revision which is 
published commercially by Holt, Rinehart and Winston, Inc., under the 
title Biological Science: Patterns and Processes.

Curriculum development projects typically rely upon very indirect 
data to evaluate instructional materials and teaching procedures. The
prime data are the opinions of teachers. Usually the teacher is asked to 
evaluate rather large units of material and whole systems of concepts with 
a brief comment and a rating or two. It is not clear what criteria and 
what standards of excellence are being applied when a teacher judges a 
lesson to be successful or unsuccessful. Student interest may not be 
clearly delineated from student learning. The impressions of a teacher, 
as a participant-observer, may be unduly colored by the performance of a 
few students. Finally, even when there is consensus that a lesson was 
unsuccessful, teacher evaluations will not always be helpful in 
identifying the gaps in student understanding that must be filled to make 
the lesson a success. To be sure, teacher judgments do provide important 
information, but they cannot reasonably bear the whole burden of 
identifying the strengths and weaknesses of an instructional program, 
especially if there are attractive alternatives. One such alternative is
the substance of this study. 

Of the several things that should be taken into consideration when 
curricula are evaluated, student learning is among the most important. 

It seems obvious that a direct measure of student performance will be a 
better indicator of learning than teacher impressions. Indeed, there is 
evidence that designing lessons on the basis of student performance data
can result in substantially improved instruction. 

The BSCS, because of its participation in the effort to design 
instructional materials for this special purpose and population, is in a 
uniquely advantageous position to test the result, and, hopefully, to 
suggest to educators as a whole the criteria upon which success or failure
in this effort may rest.

This study, therefore, was directed toward the application of more 
effective evaluative techniques to assist in improving instruction for a 
significant fraction of the school population that has been consistently 
neglected. A major purpose of this study was to obtain reliable data on 
the effectiveness of the current materials in order to determine which 
procedures most improve the impact of these materials on problems of 
teaching the academically unsuccessful student significant ideas of 
modern biology. Subsidiary goals were to demonstrate the effectiveness
of the overall design of these materials and to develop tests that could 
eventually be used by teachers for classroom evaluation of the 
academically unsuccessful student. Successful completion of this project 
may well provide a precedent and example which can be followed by other 
projects and for other materials to advance the cause of improved 
instruction for all students of the sciences. 

One reason for giving achievement tests is to evaluate students; that
is, to rank them, to assign grades, or predict those likely to do well in 
college. When this is the purpose, it is appropriate to use the classical 
psychometric model. According to this model the likelihood of reliable 
discriminations between students is maximized when (1) the correlation of 
each item score with the total test score is maximized, and (2) the 
difficulty level of the items is as close to 50 percent as possible. When 
the full-dress treatment is given to the development of an achievement 
test, a large pool of items is tried out with a sample from the population 
for which the test is intended. Items that perform best in terms of the 
two criteria listed above are included in the final version of the test. 
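
The two selection criteria can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the study: the response matrix, the function names, and the cutoff values (min_r, max_gap) are hypothetical. It computes each item's difficulty (proportion correct) and its correlation with the total score, then keeps the items a classical selection would favor.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_statistics(responses):
    """responses: one row per student, one 0/1 entry per item."""
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        stats.append({
            "item": j,
            "difficulty": mean(item),            # proportion answering correctly
            "item_total_r": pearson(item, totals),
        })
    return stats


def classical_selection(stats, min_r=0.3, max_gap=0.3):
    """Keep items near 50 percent difficulty with a high item-total correlation.

    The thresholds are hypothetical; the report does not state any cutoffs.
    """
    return [s["item"] for s in stats
            if s["item_total_r"] >= min_r
            and abs(s["difficulty"] - 0.5) <= max_gap]


# Made-up data: item 0 is answered correctly by everyone, item 3 by no one.
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
stats = item_statistics(responses)
kept = classical_selection(stats)
```

In this made-up example only the two middle items survive. The item everyone answered correctly and the item no one answered correctly are both dropped, because a constant item cannot correlate with the total score.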

A second reason for giving achievement tests is to evaluate the 
quality of instruction. Two somewhat different purposes may be 
distinguished. The first is "summative evaluation", so called because the 
purpose is to give a final test to a total instructional package, perhaps 
comparing it to competing programs, in order to provide potential 
consumers with information upon which to make a use decision. The second 
is "formative evaluation", wherein the purpose is to provide information 
to authors or teachers to help them improve the instructional program. 

Of these two goals, this study was concerned primarily with formative
evaluation.



1. R. C. Anderson, "Educational Psychology," Annual Review of
Psychology 18 (1967): 129-164.


It is only in the last few years that it has become clear to 
educational researchers that the classical psychometric model is 
inappropriate for either summative or formative evaluation. The
procedures for selecting items dictated by the model cause the evaluator 
to discard items that most students answer correctly. Consequently, 
information about which concepts were well learned is lost. More serious,
however, is the fact that items on which everyone does poorly are 
eliminated and, therefore, information about the weak points in the 
instructional program is systematically destroyed. The better the 
instruction that precedes the test the more likely the test is to contain 
tricky, hairsplitting questions on the footnotes rather than the main 
themes of instruction. This state of affairs follows from the logic of 
the model implying that the difficulty level of a test should be 50 
percent no matter how much and how well students have learned. Finally, 
the criterion that individual items should correlate highly with the 
total score biases the selection of items in the direction of those which 
measure relatively enduring student traits like verbal ability. At the
same time, this criterion probably involves a bias against selecting 
items that are sensitive to immediate situational factors; for instance, 
whether the student has been subject to good or poor teaching. 

The objections to the classical psychometric model have been 
detailed here because the major course content improvement projects, 
including the Biological Sciences Curriculum Study, have uniformly 
developed achievement tests that are psychometrically "good." 

This study attempted to employ "criterion-referenced" tests. The 
sole basis for selecting a test item was whether the student's answers to 
the item would indicate the extent to which he understood an important 
concept (or could apply a problem-solving skill, use an experimental
technique, etc.). There was no attempt to regulate the difficulty of
items in advance of the research. The whole point of the research was to
determine which items were easy (the student learned and now understands)
and which were difficult (the student did not learn and does not understand).



2. R. Glaser, "Instructional Technology and the Measurement of
Learning Outcomes: Some Questions," American Psychologist 18 (1963):
519-521.

3. Richard C. Cox and Julie S. Vargas, A Comparison of Item Selecting
Techniques for Norm-referenced and Criterion-referenced Tests
(Pittsburgh, Pennsylvania: Learning Research and Development Center,
University of Pittsburgh, February, 1966).

4. Ralph W. Tyler, Robert M. Gagne, and Michael Scriven, Perspectives
of Curriculum Evaluation, American Educational Research Assn.
Monograph Series, (Washington, D.C.: Rand McNally & Co., 1967).
Chapter 2

Research Design and Analysis 



Research Design 

The course materials, Biological Science: Patterns and Processes,
were divided into five areas of study: ecological relationships, cell 
energy processes, reproduction and development, genetic continuity, 
and organic evolution. Each area of study was analyzed for significant 
concepts that served as guides for developing test items. Items were 
sorted into two test forms, A and B, of equal length, with at least one 
item for each concept. In this way alternate forms for each of five unit 
tests were developed. Randomly selected students within each class were 
administered these alternate forms for each unit test as shown in Figure 1. 



Classroom      Pretest                    Posttest
Subgroups      Form       Instruction     Form

Subgroup 1     A          X               B
Subgroup 2     B          X               A


Figure 1. The Alternate Form, Pretest-Posttest Design 



This alternate form, pretest-posttest design eliminated the
facilitation of posttest performance that occurs when a single form is
used for both tests, and had the additional advantage that data on twice
as many items were obtained with the same investment of student time. The
number of items per student is an important consideration when the purpose
is to discriminate among students, but when the goal is to discriminate
between the well-learned and not well-learned concepts, the total number
of items becomes paramount.
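
The assignment of students to the two subgroups in Figure 1 can be sketched as follows. This is a minimal Python illustration, not the procedure actually used in the study: the function name, the student identifiers, and the even split are assumptions, and the report does not say how the random selection was carried out.

```python
import random


def assign_alternate_forms(student_ids, seed=None):
    """Randomly split one class into two subgroups with crossed test forms.

    Subgroup 1 takes Form A as pretest and Form B as posttest;
    Subgroup 2 takes Form B as pretest and Form A as posttest.
    """
    rng = random.Random(seed)      # seeded generator so the split is repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2      # assumed: split as evenly as possible
    plan = {}
    for position, student in enumerate(shuffled):
        if position < half:
            plan[student] = {"subgroup": 1, "pretest": "A", "posttest": "B"}
        else:
            plan[student] = {"subgroup": 2, "pretest": "B", "posttest": "A"}
    return plan


# Hypothetical class of six students.
plan = assign_alternate_forms(["s01", "s02", "s03", "s04", "s05", "s06"], seed=1)
```

No student sees the same form twice, yet every item on both forms is answered before and after instruction by some part of the class.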

A control group was not required in that the purpose of the study was 
to identify the effects of instruction with particular materials so that 
the revision of materials and suggestions for teacher adaptation of 
materials could be accomplished. 

The five pairs of alternate form multiple-choice tests served as the
dependent variables in the study. The Verbal Reasoning (VR) and Numerical
Ability (NA) subtests of the Differential Aptitude Test (DAT), Form A,
and the Davis Reading Test (DRT), Comprehension and Speed Tests, served
as independent variables. In addition to the test data on the student 
population, community characteristics and school district size were 
secured through a teacher questionnaire and from published statistical 
data. 



1. See Chapters 4-7.

2. Gerald Kahn and Warren Hughes, "Statistics of Local Public School 
Systems, 1967. Fall 1967: Pupils Schools/Staff. 1966-67: 
Expenditures." National Center for Educational Statistics, Government 
Printing Office, Washington, D.C.: Superintendent of Documents,
March, 1969.
Analysis



The data were subjected to statistical analysis on the CDC 6400 
computer at the University of Colorado, Boulder.

The initial analysis was run early in June to provide data for the 
writing team to use in improving the Revised Edition of Patterns and 
Processes . Output from the initial run included the percent correct on 
the pretest and posttest, and the percent possible gain for the groups
of items comprising each of the concepts on each test administered. 
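
The report defines percent possible gain in Appendix C, which is not reproduced in this section. The Python sketch below therefore rests on an assumption: it uses the common definition in which the pretest-to-posttest gain is expressed as a percentage of the room left for improvement. The function names and the counts in the example are hypothetical.

```python
def percent_correct(n_correct, n_responses):
    """Percent of responses answered correctly for a group of items."""
    return 100.0 * n_correct / n_responses


def percent_possible_gain(pre_pct, post_pct):
    """Gain as a percentage of the maximum gain still available.

    Assumed formula (not confirmed by the report):
        100 * (post - pre) / (100 - pre)
    Returns None when the pretest is already at 100 percent,
    because no further gain is possible.
    """
    if pre_pct >= 100.0:
        return None
    return 100.0 * (post_pct - pre_pct) / (100.0 - pre_pct)


# Hypothetical concept: 12 of 30 responses correct before instruction,
# 21 of 30 correct afterwards.
pre = percent_correct(12, 30)
post = percent_correct(21, 30)
gain = percent_possible_gain(pre, post)
```

Under this assumed definition, a concept that moves from 40 to 70 percent correct has realized half of its possible gain.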

Later, a complete item analysis was run on each test using the 
FORTAP (Fortran Test Analysis Package) program developed by Baker and 
Martin and modified by personnel of the Laboratory of Educational
Research, University of Colorado, Boulder. Data obtained with this
program included mean, standard deviation, standard error, and a Hoyt 
Reliability estimate for each test. In addition, difficulty (% correct),
R biserial, X50-values, and β-values were printed out for every response
on every test item.
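
A Hoyt reliability estimate comes from a two-way (students by items) analysis of variance: one minus the ratio of the residual mean square to the between-students mean square. The sketch below is a minimal Python illustration with a made-up score matrix. It is not the FORTAP code, and the function name is hypothetical.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability estimate.

    scores: one row per student, one numeric entry per item.
    """
    n = len(scores)        # number of students
    k = len(scores[0])     # number of items
    grand = sum(sum(row) for row in scores)
    correction = grand ** 2 / (n * k)

    # Sums of squares for a two-way layout without replication.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_students = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_items = sum(sum(row[j] for row in scores) ** 2
                   for j in range(k)) / n - correction
    ss_error = ss_total - ss_students - ss_items

    ms_students = ss_students / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_students == 0:
        raise ValueError("no variance between students; reliability undefined")
    return 1.0 - ms_error / ms_students


# Made-up data: four students, three items scored 0 or 1.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = hoyt_reliability(scores)
```

For items scored 0 or 1 this estimate is algebraically the same as Kuder-Richardson formula 20, so either computation could serve as a check on the other.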

The results of the initial FORTAP analysis were carefully scrutinized, 
and some items were eliminated on the basis of logical, factual, or 
structural errors in the item itself. Before any subsequent analysis was
conducted, each correct response was given a weight of 4 to compensate for 
guessing and all tests, with "bad items" deleted, were rerun on the FORTAP 
program to yield more accurate reliability estimates. Punched output, 
including weighted (×4) scores for each item and final score, was obtained
for each student. The cards with weighted scores were matched with cards 
containing DAT and DRT data. Only those students for whom complete data 
(pretest, posttest, DAT, and DRT) were available were used for the 
subsequent analysis. 
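
The weighting and card-matching steps amount to scoring each correct response as 4 and keeping only students who appear in all four record sets. The Python sketch below illustrates this with dictionaries in place of punched cards. All names and data are hypothetical, and because the text does not state how wrong answers were scored, the sketch applies no penalty for them.

```python
WEIGHT = 4  # weight given to each correct response, as stated in the text


def weighted_score(item_responses):
    """Total score with every correct (1) response counted as WEIGHT points."""
    return WEIGHT * sum(item_responses)


def complete_cases(pretest, posttest, dat, drt):
    """Keep only students present in all four record sets.

    Each argument is a dict keyed by student id. Pretest and posttest
    values are lists of 0/1 item responses; DAT and DRT values are
    already-scored results.
    """
    ids = set(pretest) & set(posttest) & set(dat) & set(drt)
    return {
        sid: {
            "pretest": weighted_score(pretest[sid]),
            "posttest": weighted_score(posttest[sid]),
            "dat": dat[sid],
            "drt": drt[sid],
        }
        for sid in sorted(ids)
    }


# Made-up records: student "s03" has no DRT score and is dropped.
pretest = {"s01": [1, 0, 0], "s02": [0, 0, 1], "s03": [1, 1, 0]}
posttest = {"s01": [1, 1, 0], "s02": [1, 1, 1], "s03": [1, 1, 1]}
dat = {"s01": 22, "s02": 31, "s03": 18}
drt = {"s01": 40, "s02": 52}
matched = complete_cases(pretest, posttest, dat, drt)
```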

A factor analysis was run on each test to determine whether or not 
the items grouped in each concept were loading on similar factors 



BMD03M General Factor Analysis Program. The BMD03M performs a principal
component solution and an orthogonal rotation of the factor matrix. 
Communalities were estimated from the squared multiple correlation 
coefficients (r²). Output from the 03M included the mean and standard
deviation of each variable, correlation matrix, eigenvalues including
cumulative proportions of total variance, eigenvectors, and a factor
matrix. The Harris-Kaiser factor analysis program performed an oblique 



3. See Appendix C. 

4. F. B. Baker and T. J. Martin, FORTAP: A Fortran Test Analysis
Package, Laboratory of Experimental Design, Wisconsin Research and
Development Center for Cognitive Learning. The University of 
Wisconsin, March 1, 1968. 

5. W. J. Dixon, ed., BMD Biomedical Computer Programs (Berkeley:
University of California Press, 1968), pp. 169-184.




by computer programs. The raw data were processed by the 